The Gemm NN GPU is missing declaration of datatype #138 #139
abouteiller merged 5 commits into ICLDisco:master
Is there a way to handle this more gracefully in PaRSEC than a segfault?
Yes, the resulting symptom is that we have data.dst_type==NULL and data.dst_count==1, which should be impossible; at a minimum we could assert on it.
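A minimal sketch of such a guard, for illustration only. The `remote_dep_data_t` struct and both helper functions are hypothetical stand-ins and not PaRSEC's actual types or API; only the field names `dst_type` and `dst_count` come from the comment above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the descriptor discussed above; the real
 * PaRSEC structure has more fields and different types. */
typedef struct {
    const void *dst_type;   /* datatype used for the receive, NULL if undeclared */
    int         dst_count;  /* number of elements of dst_type to receive */
} remote_dep_data_t;

/* Returns 1 when the descriptor is consistent, 0 when it requests a
 * non-zero count of an undeclared (NULL) datatype. */
static int remote_dep_data_is_valid(const remote_dep_data_t *data)
{
    return !(data->dst_type == NULL && data->dst_count > 0);
}

/* Debug-build guard: fail with a readable message at the point where the
 * inconsistency is detected, rather than segfaulting later. */
static void remote_dep_data_check(const remote_dep_data_t *data)
{
    assert(remote_dep_data_is_valid(data) &&
           "receive requested with a NULL datatype");
}
```

Keeping the predicate separate from the `assert` would let a release build report the error through a normal return path instead of aborting.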
bosilca left a comment
How has this ever worked without a proper datatype and arena? The puzzling part is that neither GEMM nor POTRF has a default tile type.
In PO and GEMM_SUMMA/GEMM we use the [...] This has also been reported by J John in #67, but we didn't follow up on it because we could not see it ourselves.
The arenas in the data collection are not linked to the arenas in the taskpool, and that datatype is only used for sends, once we have a valid data copy. When we receive data and create the local data copy, which is not attached to any data collection, we extract the arena and the datatype from the taskpool, and right now these are NULL. So how are we receiving anything?
(From Slack, for posterity) See dplasmajdf_lapack_dtt.h:61: we don't hit the arena_datatypes in the taskpool in this pathway. This interacts with how we set the type_remote in the JDF (ADT_DC); in that case we read the types from a hashtable.
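The two lookup paths mentioned here can be pictured with a small stand-alone sketch. Everything in it is hypothetical: `adt_t`, `taskpool_t`, `adt_entry_t`, and both lookup functions are illustrative names, and a linear table stands in for the hashtable. It only shows why an unpopulated array slot yields nothing while the keyed lookup can still succeed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical arena/datatype pair. */
typedef struct {
    const void *arena;
    const void *datatype;
} adt_t;

/* Hypothetical taskpool carrying a fixed array of arena/datatype pairs. */
#define NB_ADT 4
typedef struct {
    adt_t arena_datatypes[NB_ADT];
} taskpool_t;

/* Hypothetical keyed entry; a linear table stands in for the hashtable. */
typedef struct {
    const char *key;
    adt_t       adt;
} adt_entry_t;

/* Path 1: index into the taskpool array. Returns NULL when the index is
 * out of range or the slot was never given a datatype. */
static const adt_t *lookup_in_taskpool(const taskpool_t *tp, int idx)
{
    if (idx < 0 || idx >= NB_ADT) return NULL;
    const adt_t *adt = &tp->arena_datatypes[idx];
    return (adt->datatype != NULL) ? adt : NULL;
}

/* Path 2: keyed lookup. Returns NULL when no entry matches the key. */
static const adt_t *lookup_in_table(const adt_entry_t *table, size_t n,
                                    const char *key)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].key, key) == 0) return &table[i].adt;
    }
    return NULL;
}
```

In this sketch a caller that only consults the keyed table never notices that the array slots are empty, which is one way the missing declaration could go undetected on some paths.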
We discussed on Slack that it would be good to unify the ADTT_READ hashtable and the arena_datatypes array mechanics, but I don't have time for this at the moment, so it will have to wait.
e522b6a to 74ab1a6
Co-authored-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
74ab1a6 to 272114e
…sue #138